04 Resampling ¶
In the case of the images that we have loaded so far, all the data have the same resolution (30m). In the previous tutorial we saw that it is straightforward to align these datasets even though they cover slightly different areas. In some cases, though, the resolution differs between bands. This is the case for band 8 (the panchromatic band), whose resolution is 15m. This section will demonstrate aggregation (down-sampling) and interpolation (up-sampling). In practice, aggregation is much more common.
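The difference between the two operations can be sketched on a toy grid with plain NumPy (the array and block sizes here are illustrative, not taken from the Landsat data):

```python
import numpy as np

# Toy 4x4 "30 m" grid of values.
fine = np.arange(16, dtype=float).reshape(4, 4)

# Aggregation (down-sampling): average non-overlapping 2x2 blocks,
# halving the resolution (analogous to 30 m -> 60 m).
coarse = fine.reshape(2, 2, 2, 2).mean(axis=(1, 3))

# Interpolation (up-sampling): nearest-neighbour repetition doubles the
# resolution (analogous to 30 m -> 15 m) without adding any information.
upsampled = np.repeat(np.repeat(fine, 2, axis=0), 2, axis=1)

print(coarse.shape, upsampled.shape)  # (2, 2) (8, 8)
```

Note that aggregation discards detail while up-sampling only redistributes the existing values onto a finer grid.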
import intake
import numpy as np
import xarray as xr
import holoviews as hv
import cartopy.crs as ccrs
import geoviews as gv
import hvplot.xarray
hv.extension('bokeh', width=80)
We'll be using the datashader operations `rasterize` and `regrid` to handle our multidimensional regridding.
from holoviews.operation.datashader import regrid, rasterize
from datashader import transfer_functions as tf, reductions as rd
Recap: Loading data ¶
cat = intake.open_catalog('../catalog.yml')
l5_da = cat.l5().read_chunked()
l8_da = cat.l8().read_chunked()
crs = ccrs.epsg(32611)
Compute NDVI ¶
Now we will calculate NDVI for each of these image sets.
NDVI_1988 = (l5_da.sel(band=5) - l5_da.sel(band=4)) / (l5_da.sel(band=5) + l5_da.sel(band=4))
NDVI_2017 = (l8_da.sel(band=5) - l8_da.sel(band=4)) / (l8_da.sel(band=5) + l8_da.sel(band=4))
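NDVI is the normalized difference of the near-infrared and red reflectances, (NIR − Red) / (NIR + Red). As a quick illustration on synthetic reflectance values (not taken from the scenes above):

```python
import numpy as np

# Illustrative reflectances for three pixels: dense vegetation,
# sparse vegetation, and bare ground (values are made up).
red = np.array([0.10, 0.20, 0.30])
nir = np.array([0.50, 0.40, 0.30])

ndvi = (nir - red) / (nir + red)
print(ndvi)
```

Values near 1 indicate healthy vegetation; values near 0 indicate bare ground.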
Aligning the data ¶
NDVI_by_year = xr.concat([NDVI_1988, NDVI_2017], dim=xr.DataArray([1988, 2017], dims='year', name='year'))
NDVI_by_year
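The same concat-along-a-new-coordinate pattern works on any pair of arrays sharing coordinates; here is a minimal sketch with tiny made-up grids:

```python
import numpy as np
import xarray as xr

# Two tiny 2x2 "NDVI" grids on the same x/y coordinates (illustrative values).
a = xr.DataArray(np.zeros((2, 2)), coords={'y': [0, 1], 'x': [0, 1]}, dims=('y', 'x'))
b = xr.DataArray(np.ones((2, 2)), coords={'y': [0, 1], 'x': [0, 1]}, dims=('y', 'x'))

# Stack them along a new 'year' dimension, as done for the NDVI rasters above.
by_year = xr.concat([a, b], dim=xr.DataArray([1988, 2017], dims='year', name='year'))
print(by_year.dims)  # ('year', 'y', 'x')
```

The new `year` dimension then supports label-based selection such as `by_year.sel(year=2017)`.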
Select region of interest ¶
We'll use the area around the central point as the Region of Interest (ROI). In this case we'll use a 30 km box around the center point.
x_center, y_center = crs.transform_point(-118.7081, 38.6942, ccrs.PlateCarree())
buffer = 1.5e4
xmin = x_center - buffer
xmax = x_center + buffer
ymin = y_center - buffer
ymax = y_center + buffer
ROI = NDVI_by_year.sel(x=slice(xmin, xmax), y=slice(ymin, ymax))
p = ROI.hvplot('x','y', col='year', crs=crs, shared_axes=True, width=400, height=300, cmap='viridis')
Aggregation ¶
We'll define a new resolution that is visibly different from 30m.
res = 1e3
Just to make things pretty and as a sanity check, let's turn the colorbar back on for both plots and set the width of the first plot slightly higher to account for the extra y-axis it carries.
p[1988] = p[1988].options(colorbar=True, width=370, height=300)
p[2017] = p[2017].options(colorbar=True, width=310, height=300)
p_1000 = regrid(p, x_sampling=res, y_sampling=res)
p_1000
Notice how fast it was to generate these plots; that is because the resolution is low. Aggregation is by mean by default for gridded data, but there are other ways to aggregate. For instance, we can aggregate by the standard deviation of the grid cells that are aggregated into each pixel:
regrid(p, x_sampling=res, y_sampling=res, aggregator=rd.std()).relabel('Aggregated by std')
# Exercise: Try to regrid using a different aggregator. Use tab completion to find other methods on `rd`.
# Exercise: Now using std, try changing the resolution.
Here, the `std` values are high when the displayed pixel includes grid cells with a wide range of different NDVI values, which is clearly true around the bounds of the lake in 2017.
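A hand-rolled analogue of std aggregation makes this concrete: homogeneous blocks collapse to zero, while blocks that mix different values stand out. This toy grid is illustrative and unrelated to the NDVI data:

```python
import numpy as np

# A 4x4 grid: the top-right 2x2 block mixes two values, the rest are uniform.
grid = np.array([[1., 1., 5., 9.],
                 [1., 1., 5., 9.],
                 [2., 2., 2., 2.],
                 [2., 2., 2., 2.]])

# Aggregate non-overlapping 2x2 blocks by their standard deviation,
# mimicking regrid(..., aggregator=rd.std()).
block_std = grid.reshape(2, 2, 2, 2).std(axis=(1, 3))
print(block_std)
```

Only the mixed block produces a nonzero value, which is exactly why the lake boundary lights up in the std plot.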
Similar workflow in xarray ¶
The above computations used Datashader, but we can accomplish a similar thing in xarray by grouping the values into bins based on the desired resolution and taking the mean of each of those bins.
res = 1e3
x = np.arange(ROI.x.min(), ROI.x.max(), res)
y = np.arange(ROI.y.min(), ROI.y.max(), res)
We'll use the left edge as the label for now.
da_1000 = (ROI
    .groupby_bins('x', x, labels=x[:-1]).mean(dim='x')
    .groupby_bins('y', y, labels=y[:-1]).mean(dim='y')
    .rename(x_bins='x', y_bins='y')
)
da_1000
# Exercise: Try to aggregate using a method other than mean.
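One possible approach to the exercise: `groupby_bins` returns a GroupBy object, so any of its reducers (`std`, `max`, `median`, …) can replace `mean`. A minimal 1-D sketch with made-up values and bin edges:

```python
import numpy as np
import xarray as xr

# Toy 1-D version of the binning pattern above, using std instead of mean.
vals = np.array([1., 1., 1., 1., 5., 5., 9., 9.])
da = xr.DataArray(vals, coords={'x': np.arange(8) + 0.5}, dims='x')

edges = np.array([0., 4., 8.])  # two 4-unit-wide bins
binned = da.groupby_bins('x', edges, labels=edges[:-1]).std()
print(binned.values)  # [0. 2.]
```

The first bin is uniform (std 0), while the second mixes 5s and 9s (std 2), matching the block-wise intuition from the datashader std plot.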
Compare ¶
We can compare this to the results from using datashader regridding by getting the data from p_1000 and subtracting the nearest data from da_1000.
def get_data(p):
    df = p.dframe()
    pivotted = df.pivot(index='y', columns='x', values='value')
    stacked = pivotted.stack()
    return xr.DataArray.from_series(stacked)
(da_1000.sel(year=2017).reindex(get_data(p_1000[2017]).indexes, method='nearest') -
get_data(p_1000[2017])).hvplot('x','y')
# Exercise: Why are the top and right edges particularly bad, and how would you fix that?
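The key step in that comparison is `reindex(..., method='nearest')`, which snaps one grid's labels onto another's before subtracting. A minimal 1-D sketch with illustrative coordinates:

```python
import numpy as np
import xarray as xr

# Two arrays on slightly offset coordinates, like the datashader
# and xarray grids above (values and coordinates are made up).
a = xr.DataArray([1., 2., 3.], coords={'x': [0., 1., 2.]}, dims='x')
b = xr.DataArray([10., 20., 30.], coords={'x': [0.1, 0.9, 2.2]}, dims='x')

# Move b onto a's coordinates by snapping each label to the nearest one,
# so that elementwise subtraction lines up.
aligned = b.reindex(x=a.indexes['x'], method='nearest')
print((a - aligned).values)
```

Without the reindex, xarray would align on exact coordinate labels and the difference would be all-NaN.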
Handling bands with different resolutions ¶
First we need to load the band-8 data. We'll grab it straight from S3, fetching band 8 at the same path and row and the same time. You can inspect data availability for this scene from landsatonaws.com.
da_8 = cat.amazon_landsat_band(product_id='LC08_L1TP_042033_20171022_20171107_01_T1', path=42, row=33, band=8).to_dask()
We can slice out our ROI, but we have to be careful to specify the max and min in the appropriate order for each coordinate (max:min for the y coordinates (as they are in decreasing order), min:max for the x coordinates (as they are in increasing order)).
ROI_8 = da_8.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))
ROI_8 = ROI_8.drop('band').squeeze().persist()
ROI_8
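The ordering rule for `slice` bounds can be checked on a tiny array whose `y` coordinate decreases, as in these rasters (coordinates and values here are illustrative):

```python
import numpy as np
import xarray as xr

# y decreases top-to-bottom, as in the Landsat rasters above.
da = xr.DataArray(np.arange(16.).reshape(4, 4),
                  coords={'y': [40., 30., 20., 10.], 'x': [1., 2., 3., 4.]},
                  dims=('y', 'x'))

# Slice bounds must follow each coordinate's own ordering:
# max:min for the decreasing y, min:max for the increasing x.
sub = da.sel(y=slice(35., 15.), x=slice(2., 4.))
print(sub.y.values, sub.x.values)
```

Passing the bounds in the wrong order does not raise an error; it silently returns an empty selection, so this is worth double-checking.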
p_8 = ROI_8.hvplot('x', 'y', width=500, height=400)
p_8